Bioinformatic and Statistical Analysis of Microbiome Data: From Raw Sequences to Advanced Modeling with QIIME 2 and R (Yinglin Xia, Jun Sun)

Index

Abundance based coverage estimator (ACE), 290–298, 330, 572, 573

Adaptive Gauss-Hermite quadrature (AGQ), 593–595, 608, 647, 659, 660

Additive log-ratio (alr), 496–498, 519, 551, 690

adonis(), 409, 410, 412, 418, 426, 685, 689

adonis2(), 409–414, 416

aGLMM-MiRKAT, 676, 677

AICtab(), 633, 635

Aitchison simplex, 491, 493, 496–498, 551

Akaike information criterion (AIC), 562, 575, 576, 600–605, 608, 618–620, 633–635, 640, 642–646, 651, 652, 658, 660, 670

ALDEx2, 491, 498, 500–518, 520, 526, 527, 549, 551

Alpha diversity, 2, 5, 47, 135, 136, 154, 156, 257, 272, 289–330, 336, 338, 557, 572, 576

Alpha-phylogenetic method, 326

ampvis2 package, 293–296, 300, 303, 308, 312, 319, 352–353, 355, 357, 388, 572

Analysis of composition of microbiomes (ANCOM), 491, 518–528, 532, 547–549, 551

Analysis of composition of microbiomes with bias correction (ANCOM-BC), 491, 518, 528–549, 551

Analysis of similarity (ANOSIM), 236, 341, 397–405, 409, 417, 418, 429, 431

ANCOMBC package, 532–547

Ape, 11, 29, 48–53, 75, 346, 352, 356

Artifacts, 2–5, 8, 65, 68, 69, 73, 76–79, 87–90, 92, 100–101, 104, 107, 110, 114, 116, 117, 127, 132, 133, 136–138, 149–151, 153, 154, 156, 230, 233, 323, 371, 382, 429, 465, 492, 520, 522, 523, 580, 689

Autoregressive of order 1 [AR(1)], 568, 597, 663, 667

Average-linkage clustering, 162, 174, 177, 196–198, 214, 215, 218, 230, 238, 239, 259

Bayesian information criterion (BIC), 562, 601, 603, 605, 608, 618, 619, 633–635, 640, 642–646, 651, 652, 658, 660, 670

bbmle package, 633, 651

Benjamini–Hochberg (BH) procedure, 503, 511, 531, 540

betadisper(), 418–420

Beta diversity, 2, 43, 47, 152, 154, 209, 210, 261, 271, 289, 290, 322, 323, 330, 335–348, 371, 381–384, 388, 397–431, 441, 520, 557, 580

BICtab(), 633

Biological classifications, 174–176, 209, 218

BIOM format, 47, 54–56, 61, 74, 92, 134, 294

Bland-Altman plot, 508, 509, 514

Bonferroni correction method, 540

Box plots, 18–21, 102, 112, 113, 115

Bray-Curtis distance, 86, 87, 382, 383, 406, 409, 418, 429–430

Bray-Curtis index, 336–340

Bridge criterion (BC), 251, 532, 603–604, 608

calcNormFactors(), 460, 466

Canonical correspondence analysis (CCA), 351–353, 355–357, 359, 363, 368, 377–381, 385, 388, 398

Castor, 11, 48, 52–54, 75

Centered log-ratio (clr), 45, 344, 496, 497, 505, 506, 518, 551, 690

Chao 1, 290–298, 325, 330, 572, 573, 665

Characters, 12, 15, 16, 27, 49, 52, 53, 67, 69–71, 97, 137, 231, 236, 245–247, 267, 351, 360, 386, 443, 444, 512, 533, 681

Closed-reference clustering, 149–152, 154–156

Cluster-free filtering (CFF), 149–152, 154–157, 161, 254, 259–261, 269, 271, 273

Clustering, 2, 8, 103, 104, 119, 123, 125, 140, 147–157, 161, 162, 164, 176, 177, 191, 193–200, 202–204, 208–210, 214–218, 228–235, 237–241, 248–255, 257, 258, 260–269, 335, 345, 384–388, 398, 404, 449, 616, 677, 678

Clustering-based OTU methods, 161, 162, 209–218, 227–250, 252–254, 261, 273

Commonly used clustering-based OTU methods, 161, 177, 192–207, 215, 216, 218, 273

Complete linkage clustering, 196–198, 200, 214, 229, 230, 259

Compositional data, 491–500, 505, 548–551, 677, 690, 691

Conway-Maxwell-Poisson distribution, 621, 623, 630, 646

Correlated sequence kernel association test (cSKAT), 676

Correlation matrix, 140, 180, 188, 451, 568, 588, 597, 663, 664, 679, 680, 684, 685, 688

Correspondence analysis (CA), 351, 355, 366–372, 377, 387

CSV, 25, 46, 47, 97, 327, 430, 474

cumNormMat(), 459–460, 466

Cumulative sum scaling (CSS), 436, 438, 440, 441, 454, 465–467, 479

curatedMetagenomicData, 24, 48

DADA2, 2, 95, 103–108, 113, 114, 116–119, 124, 125, 128–132, 141, 148, 149, 156, 254, 256–259, 261–266, 268–270, 273, 296, 520

Deblur, 2, 95, 99, 103, 104, 108, 113–119, 148–150, 254, 256–259, 262, 264–265, 269, 270, 273

Demultiplexed, 71–73, 78, 95–108, 112–119, 149

Demultiplexed paired-end FASTQ data, 78, 95–108, 119

Denoising-based methods, 250, 254, 256–267, 273

De novo clustering, 153–156, 230, 234, 253

Density plots, 22

DESeq, 465, 517

Detrended correspondence analysis (DCA), 351, 352, 356, 371–374, 378, 385, 388

Deviance information criterion (DIC), 604, 608

devtools package, 649

DHARMa package, 640, 670

Difference plot, 508

Discovery odds ratio testing, 455, 466

Discriminant analysis, 193, 194, 203–207, 216–218

dist(), 401, 419

Distance matrix, 4, 52, 79, 86–87, 92, 135, 203, 209, 229, 235, 345, 347, 348, 351, 352, 360, 399, 407, 409, 424, 429, 680, 681, 689

Distribution-based clustering (DBC), 252–254, 260, 273

download.file(), 15

Ecological similarity, 236, 240, 260, 261

edgeR, 465, 488, 517, 518, 548, 549, 616

effectPlotData(), 648

Effect size, 168, 400, 465, 503–512, 516, 547, 606, 607

EM-IWLS algorithm, 592, 598, 660, 662–664, 670

Emperor plots, 382–384, 388, 428, 520, 521, 690

Entropy-based methods, 255

Eukaryote species, 242, 243

exportMat(), 461, 467

exportStats(), 461, 467

EzBioCloud, 124

Factor analysis, 162, 168, 180, 181, 193, 195, 201, 202, 209, 218, 349, 350

False discovery rate (FDR), 140, 423, 437, 465, 466, 477, 494, 518, 526, 527, 531, 532, 540, 547–549

FASTA, 1, 3, 66–69, 71, 92, 128, 129

FASTQ, 66, 69–73, 78, 92, 95–109, 113–119

FastTree, 4, 135, 137, 138

Fast zero-inflated negative binomial mixed modeling (FZINBMM), 598, 616, 660–669, 671, 675

Feature correlations, 441, 455–456, 466

Feature table, 5, 8, 15, 25, 66, 73–76, 78–83, 85–88, 90–92, 95–119, 123, 133, 134, 147, 150, 151, 153, 269, 323, 324, 330, 442, 520–523, 528, 533, 578, 688

Filter, 41, 58, 79–87, 104, 105, 116, 118, 241, 294, 309, 353, 356, 448, 458, 466, 476, 485, 521, 625, 650, 683

Finite-sample corrected AIC (AICc), 601–603, 608, 620

fitZIBB(), 483, 484, 487

Fixed effect, 558–563, 568, 571, 573, 575–579, 589, 590, 592, 606, 607, 617, 622, 623, 627, 647, 648, 650, 658–661, 664, 665, 667

Frobenius norm, 680, 681, 685, 688, 689

F test, 205, 209, 410, 571, 574, 620

Functional analysis, 271–273, 567

Gauss-Hermite quadrature (GHQ), 593, 594, 608, 621

Generalized information criterion (GICλ), 604, 608

Generalized linear mixed models (GLMMs), 560, 562, 571, 582, 587–608, 615–622, 624–659, 662, 663, 669–671, 675, 676

Generalized linear models (GLMs), 440, 471, 482, 515, 528, 562, 563, 587–589, 596, 597, 599, 600, 608, 616, 617, 621, 622, 646, 664

Generalized nonlinear models (GNLMs), 587, 588, 608

Generalized UniFrac, 135, 336, 344, 347

Genome Taxonomy Database (GTDB), 128, 130

ggpubr, 17–23, 313, 314, 316

GLMMadaptive package, 598, 616, 617, 647–660

GLMM-MiRKAT, 676, 677

glmmTMB package, 598, 616, 621–647

glmm.zinb(), 664, 666

glmPQL(), 664

Greengenes, 116, 119, 124, 125, 127–129, 132, 151, 155, 228, 256, 257

Heuristic clustering OTU methods, 229–231, 273

Hierarchical clustering OTU methods, 229–230

Histogram plots, 22–23

HITdb, 130

Holm-Bonferroni, 531

Homogeneity, 102, 174, 177, 191, 198, 210, 366, 397, 418–422, 431, 618, 639, 641

Hypothesis tests, 163, 208, 230, 397, 410, 484, 560–561, 618

Information criteria, 562, 600–605, 608, 618, 646, 670

Inter-quartile log-ratio (iqlr), 496, 498, 500, 502, 503, 506, 513, 514, 518, 551

Inverse Simpson diversity, 303–304

Isometric log-ratio (ilr), 496–498, 500, 551, 691

Iterative weighted least squares (IWLS) algorithm, 591, 592, 596–598, 608, 660, 663

Jaccard distance, 344, 409, 418, 429, 430

Jaccard index, 322, 336, 339–340

Keemei, 98, 109

Kendall correlation method, 688

Kenward-Roger (KR) approximation, 571

KRmodcomp(), 571

Kruskal-Wallis test, 318–321, 330, 465, 501, 518, 576

Laplace approximation, 592–595, 598, 608, 621, 660

Large P small N problem, 349, 675, 692–693

Library sizes, 437, 460, 461, 466, 481, 492, 493, 497, 501, 528, 529, 533, 549, 564

libSize(), 460, 466

Likelihood-based methods, 591, 593–595, 608, 646

Likelihood ratio test (LRT), 470, 560–562, 575, 605–608, 619, 651, 670

Linear mixed-effects models (LMMs), 551, 561, 565, 577, 581, 582, 587, 589–592, 594, 598, 615, 620, 660, 663–665, 667, 669, 671, 675–677, 692

list.files(), 96

lme(), 568, 571, 664–666

lme4 package, 571, 573, 575, 621, 646

lmerTest package, 557, 570–576, 582

load(), 14–15, 356, 572

Log-normal permutation test, 453–454

Log-ratio transformations, 344, 491, 496–502, 505, 518, 522, 526, 527, 548, 549, 551, 690

Machine learning, 3, 128, 599–600, 608

MAFFT, 136

marginal_coefs(), 648

Marginalized two-part beta regression (MTPBR), 470

Markov chain Monte Carlo (MCMC)-based integration, 591, 592, 595, 596, 608

MASS package, 664

Maximum likelihood (ML), 52, 137, 472, 482, 562, 563, 590, 591, 593, 594, 596, 597, 600, 606, 616, 619, 621, 647, 659, 664

metagenomeSeq, 48, 74, 431, 436, 438, 440–466, 479, 548, 549

Microbiome package, 11, 34–36, 44, 46, 47, 61, 293, 296–298, 300, 303–305, 307, 312–316, 348, 377, 411, 426–429, 431, 572

Mothur, 2, 47, 74, 104, 124, 177, 216, 228, 230, 256–258, 263, 265, 294

MRcounts(), 456, 459–460, 466

MRexperiment object, 441, 443–446, 448, 454–459, 461–464, 466

Multi-omics integration, 271, 273

Multi-omics methods, 677–678, 693

Multiplexed paired-end FASTQ data, 95, 108–112, 119

Multivariate analysis, 207, 244, 342, 347, 350, 351, 371, 384, 397, 398, 404, 405, 409, 417–419, 431, 557, 678, 692, 693

Multivariate distance/kernel-based longitudinal models, 676–677, 693

Multivariate longitudinal microbiome analysis, 675–678

National Center for Biotechnology Information (NCBI), 124, 125, 130, 139, 256

Natural classification, 166, 169, 171–172, 175, 212, 215, 218

NBZIMM package, 616, 664–668

Negative binomial mixed models (NBMMs), 596–598, 650, 651, 669

Negative binomial (NB) model, 616

Newick tree format, 53

nifHdada2, 131

nlme package, 563–569, 582

Non-metric multidimensional scaling (NMDS), 202, 203, 351, 352, 355–357, 362–366, 385–388, 399, 678

Nonparametric MANOVA, 398, 405

Non-parametric microbial interdependence test (NMIT), 671, 675, 678–693

Normalization factors, 438, 439, 444, 445, 447, 460, 466

Normalized counts, 260, 437–439, 449, 456, 459–461, 466

Numerical integration, 591–595, 598, 608, 659, 668, 670

Numerical taxonomy, 140, 161–218, 227–231, 236, 239, 240, 244, 246, 260, 267, 349–351, 360, 385–387

Oligotyping, 254, 255, 260, 269

Open-reference clustering, 149, 154–157, 161

Operational taxonomic units (OTUs), 8, 25–30, 32, 36, 37, 40, 44, 45, 47, 54, 56, 57, 74, 91, 95, 103, 104, 107, 116, 118, 119, 123, 125, 126, 128, 132, 141, 147–157, 161–218, 227–241, 246–273, 291–295, 322, 325, 328, 345, 348, 350–356, 360, 364, 366, 368, 370, 378, 385, 386, 402, 408, 428, 435, 437–440, 442, 444, 447–449, 458, 461–462, 465, 467, 481–485, 487, 489, 493, 494, 500, 501, 506, 508, 511–514, 517–519, 528–530, 535, 548, 549, 564, 677, 681, 682, 684, 692

Ordination, 47, 154, 162, 176, 193, 194, 202, 209, 210, 216, 217, 335, 336, 339, 345, 349–353, 355, 357–379, 381–388, 397–399, 404–406, 431, 453, 678

Ordination methods, 177, 195, 202–203, 218, 330, 335, 349–381, 385–388, 398, 404, 677

Over-dispersed, 211, 387, 549, 597, 598, 602, 615–620, 629, 646, 660, 661, 669, 670

Over-dispersion, 379, 435, 437, 480–484, 582, 596, 597, 602, 607, 616, 620, 630, 636, 640, 652, 670, 671

p.adjust(), 423, 533

pairwise.perm.manova(), 423

pbkrtest, 571

p-corr-method, 688, 689

Pearson correlation method, 687

Penalized quasi-likelihood (PQL), 592–594, 596, 598, 663

Penalized quasi-likelihood-based methods, 591–593, 608

Permutational MANOVA (PERMANOVA), 341, 345, 397, 400, 405–420, 422–426, 428, 429, 431, 689

Permutation invariance, 495

Permutation tests, 208, 210, 398–400, 405, 409, 417, 419, 420, 423, 431, 448, 453–454, 466, 680

Perturbation invariance, 496

Phenetics, 162, 164–172, 174–178, 188, 194, 201, 202, 210–213, 215–218, 241, 243, 244, 246, 247, 350, 351, 385, 386

Phenetic taxonomy, 169–176, 202, 216

Phylogenetic diversity, 135, 136, 138, 249, 305, 306, 322, 326, 327, 330

Phylogenetic entropy, 305–307, 330

Phylogenetic quadratic entropy, 305, 307, 330

Phylogenetics, 2, 11, 25, 48–54, 61, 125, 136, 139, 140, 162–164, 170, 173–176, 212, 214, 232, 236, 237, 243, 245–249, 251, 261, 267, 290, 305–307, 322, 324, 326, 328, 335, 336, 342–348, 360

Phylogenetic trees, 4, 7, 8, 25, 29, 36, 41, 42, 48–53, 66, 74–77, 92, 119, 123–141, 234, 290, 305–307, 322, 326, 328, 343, 345, 356, 533

Phyloseq object, 24–31, 33, 34, 36, 46, 48, 58, 59, 88, 89, 297, 316, 345, 346, 411, 534

Phyloseq package, 24, 25, 30, 34, 35, 58, 74, 88, 179, 297, 298, 345

Physiological characteristics, 248–249

Phytools, 11, 48, 50–52

Pielou’s evenness, 304–305, 326, 327, 330

pldist, 344–345

plotQQunif(), 637

plotResiduals(), 638

Plugins, 2–5, 8, 65, 80, 83, 85, 86, 95, 99, 103–112, 114–116, 119, 128, 133, 136, 148–150, 153, 155, 261, 322, 381, 428, 520, 692

Poisson, 168, 211, 260, 262, 437, 501, 528, 588, 589, 593, 676

pr2database, 131

Presence absence testing, 454–455, 466

Principal component analysis (PCA), 154, 202, 351–360, 363, 364, 367, 368, 370, 375, 378, 385–388, 452, 453, 499, 678, 693

Principal coordinate analysis (PCoA), 154, 202, 207, 323, 351, 352, 355–357, 359–364, 382–388, 421, 422, 428, 678, 689, 690

Prokaryote/bacterial species, 242

pscl package, 616

Pseudo-likelihood (PL), 592, 594

Pyrosequencing flowgrams, 258–259, 273

q2-composition, 5, 520

q2-cutadapt, 5, 110, 111, 148

q2-dada2, 2

q2-deblur plugin, 95, 113–119

q2-feature-classifier, 5, 126, 128–135, 141

q2-feature-table, 2, 5, 80, 86

QIIME, 1–4, 7, 65–67, 92, 100, 104, 124–127, 136, 138, 155, 156, 228, 231, 232, 234, 256, 257, 263, 270, 293, 323

QIIME 2, 1–8, 11, 15, 48, 65–92, 95–101, 103, 104, 106, 108–110, 113–116, 119, 124–128, 132, 133, 136, 137, 141, 147–149, 151, 153, 157, 256, 257, 265, 270, 290, 293, 294, 296, 306, 322–330, 335, 336, 381–384, 388, 397, 428–431, 520–526, 551, 557, 576–582, 675, 687–690, 692, 693

QIIME 2 archives, 8, 77–79, 92

qiime composition ancom, 520, 523, 526

qiime longitudinal linear-mixed-effects, 576–577, 579, 582

qiime longitudinal nmit, 688

qiime2R package, 15, 87–90, 92, 296

QIIME 2 view, 3, 87, 92, 323, 581

QIIME zipped artifacts (.qza), 3, 4, 7, 8, 65, 76, 77, 87, 88, 100–101, 110, 119, 132, 151, 296, 324

QQ-plot, 494, 637, 638, 640, 653–657

q-score, 114, 116, 149

q-score-joined, 116, 149

Q-technique, 167, 180–182

q2-types, 2, 5

Quality filter, 114, 116, 132, 149

Quality of the reads, 112–113

Quasi-Akaike information criterion and corrected quasi-AIC (QAIC and QAICc), 602

q2-vsearch, 5, 115, 148–150, 157

.qza file, 88

Random effect, 470, 503, 558–564, 574, 575, 577–579, 582, 589–598, 601, 606, 607, 676

Rarefaction, 108, 154, 323, 328–330, 348, 437, 646

Raw sequence data, 1–4, 78, 96, 99, 109–110, 119

read.csv(), 12, 57, 443

read.csv2(), 12, 443

read.delim(), 12, 442

readr, 16–17, 46, 60

readRDS(), 14

read.table(), 12, 443

Redundancy analysis (RDA), 351–353, 355–357, 359, 363, 374–378, 385, 388

Reference databases, 123–129, 132, 135, 141, 150–153, 155, 228, 256, 257, 293

Restricted maximum likelihood (REML), 563, 574, 606, 619, 621

Ribosomal Database Project (RDP), 74, 124, 128, 130, 228–230, 256

R-technique, 180–182

RVAideMemoire package, 423–426, 431

Sample metadata, 1, 25, 36, 56, 79, 86, 87, 96–99, 109, 111, 112, 119, 133, 134, 327, 328, 330, 351, 412, 429, 474, 506, 520, 533, 534, 545, 578, 689

Sample size calculation, 166–168, 218

Satterthwaite approximation, 571, 573–575, 620

save(), 13, 14

save.image(), 14

saveRDS(), 12–13

Scaling invariance, 495

SeekDeep, 254, 255, 259, 263–266, 268, 273

Semantic type, 4, 8, 68, 69, 73

Semicontinuous, 469, 470, 648

seqkit, 68, 96

Sequence similarity, 228, 231, 236, 237, 240, 241, 247, 248, 260, 261, 267, 271, 273

Sequencing error, 81, 95, 103, 118, 154, 230, 235, 240–241, 251–253, 256, 259, 260, 266, 329, 330

Shannon diversity, 43, 44, 298–300, 576–581

SILVA, 124, 127, 130, 151, 228, 257

Similarity coefficients, 162, 167, 169, 176, 182–194, 196–198, 200, 208–210, 212, 213, 216, 218, 336, 339, 340, 351, 354

Similarity/resemblance matrix, 193–194

Simpson diversity, 295, 298, 300–304

Simpson evenness, 303–304

Single linkage clustering, 162, 196–197, 200, 214, 215, 234, 238, 252

Single-nucleotide resolution-based OTU methods, 250–254

Sørensen index, 336, 340–341

Spearman correlation method, 140, 494

Species and species-level analysis, 227, 241–249, 273

16S rRNA method, 246–249

stats package, 664

Subcompositional coherence, 495

Subcompositional dominance, 496

Sub-OTU methods, 252, 269–271, 273

Swarm2, 252–254, 269, 273

Taxa, 5, 25, 84, 119, 134, 236, 342, 472, 494, 566, 650, 677

Taxonomic classification, 36, 56, 103, 123–125, 132, 133, 135, 141, 212, 213, 236, 267

Taxonomic rank, 28, 124, 169–170, 213, 218, 231, 236, 535

Taxonomic resemblance, 161, 178–192, 218

Taxonomic structure, 162, 164, 168, 176–207, 209, 210, 214, 216, 218, 385, 387

Taxonomy, 2, 75, 119, 123, 228, 535, 626, 688

Taylor-series linearization, 591–593, 608

Template Model Builder (TMB) package, 621

testDispersion(), 638

testZeroInflation(), 638, 640

tidyverse, 23–24, 58–60

Total sum scaling (TSS), 436–438, 465, 466, 479

.tsv file, 296, 324, 430

Tukey mean-difference plot, 508

TukeyHSD(), 418

UCLUST, 155, 228, 231, 232, 238, 253, 265, 269

UNITE, 124, 127, 128, 151

Univariate analysis, 557, 678, 692, 693

UNOISE2, 254, 257, 259, 263–266, 268, 273

UNOISE3, 256–258, 263–264, 273

Unweighted UniFrac, 135, 136, 322, 336, 342, 345, 347, 348, 409, 431

Unweighted UniFrac distance, 343, 344, 348, 384, 409, 429, 430

USEARCH, 2, 74, 149, 155, 228, 231–234, 257, 264–266, 293

vegan package, 24, 47, 179, 336, 339–342, 348, 352, 356, 401–404, 409–419, 453, 683, 685, 689

vegdist(), 193, 336, 338–341, 356, 401, 410, 418, 419

Violin plots, 20–21, 316–318, 330

Visualizations, 3–5, 7, 8, 17, 56, 61, 87, 101, 105, 106, 108, 109, 112, 119, 133, 134, 322, 323, 349, 353, 354, 381, 382, 428, 441, 449, 521, 579, 581, 679, 689, 692

Volatility analysis, 579–582

VSEARCH, 128, 149–151, 153, 157, 234

Vuong test, 607, 608, 618, 619

Wald t test, 619, 620

Weighted UniFrac, 135, 258, 322, 336, 343, 345, 347, 409, 429, 431

Weighted UniFrac distance, 136, 336, 343–345, 347, 429, 431

write.table(), 12

xtable(), 511

xtable package, 511

Zero-hurdle, 607, 616, 619, 624, 635, 643, 646, 659, 669

Zero-hurdle negative binomial (ZHNB), 616, 618, 632, 643, 651, 669, 670

Zero-hurdle Poisson (ZHP), 616, 618, 619, 632, 635, 643, 651, 658, 659, 670

Zero-inflated, 211, 387, 436, 439, 440, 470, 479, 480, 548, 592, 598, 607, 616–622, 626, 628–630, 633, 635, 646, 660, 665, 669, 670

Zero-inflated beta, 467, 469–489

Zero-inflated beta-binomial model (ZIBB), 469, 470, 480–489

Zero-inflated beta regression (ZIBSeq), 469–480, 489

Zero-inflated beta regression model with random effects (ZIBR), 470

Zero-inflated continuous, 469, 470

Zero-inflated Gaussian (ZIG), 435–441, 465–467, 479, 480, 548, 549, 669

Zero-inflated generalized linear mixed models (ZIGLMMs), 599, 600, 617

Zero-inflated log-normal (ZILN), 435–453, 465–467

Zero-inflated negative binomial (ZINB), 479, 480, 488, 616, 618, 622, 630, 631, 641, 645, 648, 650–652, 656–661, 669, 670

Zero-inflated negative binomial mixed models (ZINBMMs), 596, 598, 660–664, 668, 670

Zero-inflated Poisson (ZIP), 479, 480, 616, 618, 619, 629, 636, 648, 650–652

ZIBBSeqDiscovery, 483, 484

zicmp, 630, 633–635, 642, 644, 646